Locality-Sensitive Hashing for Document Similarity Detection

Exploring efficient methods for detecting similar documents with Locality-Sensitive Hashing (LSH).

Project Overview

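This project applies Locality-Sensitive Hashing (LSH) to find similar documents efficiently in a dataset of more than 1,500 paragraphs, which amounts to over 1.1 million possible document pairs. Each paragraph is converted into a set of shingles, the shingle sets are compressed into MinHash signatures, and the banding technique then groups those signatures so that only likely-similar pairs are flagged as candidates. This surfaces pairs of similar documents without an exhaustive comparison of every pair.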
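The write-up names the three stages (shingling, minhashing, banding) but does not show how they fit together. The following is a minimal Python sketch of one common way to implement them, assuming character 5-shingles, 100 MinHash functions of the form (a·x + b) mod p over a BLAKE2 base hash, and 20 bands of 5 rows each. All function names, parameters, and sample texts are hypothetical illustrations, not code from the project's repository.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch: k=5 shingles, 100 hash functions, and 20 bands x 5 rows
# are illustrative assumptions, not the project's actual settings.
PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of character k-shingles of a lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|."""
    return len(a & b) / len(a | b)


def _base_hash(s: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")


def make_hash_params(num_perm: int, seed: int = 0) -> list[tuple[int, int]]:
    """Draw (a, b) pairs for h(x) = (a*x + b) mod PRIME, one per simulated permutation."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash_signature(shingle_set: set[str], params: list[tuple[int, int]]) -> list[int]:
    """For each hash function, keep the minimum hash value over the document's shingles."""
    xs = [_base_hash(s) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in xs) for a, b in params]


def estimate_similarity(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of positions where two signatures agree; an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidate_pairs(signatures: dict[str, list[int]], bands: int, rows: int) -> set[tuple[str, str]]:
    """Banding: documents that agree on every row of at least one band become candidates."""
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        if len(sig) != bands * rows:
            raise ValueError("signature length must equal bands * rows")
        for band in range(bands):
            # Include the band index in the key so equal slices from different bands never collide.
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].append(doc_id)
    candidates: set[tuple[str, str]] = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates


# Hypothetical sample paragraphs: d1 and d2 are near-duplicates, d3 is unrelated.
docs = {
    "d1": "Locality-sensitive hashing groups similar documents into the same buckets.",
    "d2": "Locality-sensitive hashing groups similar documents into the same bucket.",
    "d3": "Banana bread needs ripe bananas, flour, sugar, and butter.",
}
params = make_hash_params(num_perm=100)
sigs = {doc_id: minhash_signature(shingles(text), params) for doc_id, text in docs.items()}
pairs = lsh_candidate_pairs(sigs, bands=20, rows=5)
print(sorted(pairs))
print(estimate_similarity(sigs["d1"], sigs["d2"]), jaccard(shingles(docs["d1"]), shingles(docs["d2"])))
```

With b = 20 bands of r = 5 rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^5)^20, so the threshold of this S-curve sits near (1/20)^(1/5) ≈ 0.55. Increasing r makes the filter stricter, while increasing b makes it more permissive; candidates can then be confirmed with the exact `jaccard` check.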

Key Outcomes

Tools and Libraries

View the Code

Click the link below to view the full implementation and analysis on GitHub:

View on GitHub